50_timeseries_analysis.ipynb

This script provides an exploratory data analysis (EDA) and visualisations for the timeseries generated from the earthquakes dataset. The main routines were developed in previous courses at the University of London by the same author (Mohr, 2021, 2023, 2024a) and have been further refined to meet the needs of this MSc thesis project/research. The code has also been updated to comply with current package requirements and interdependencies. This Jupyter Notebook includes some explanatory comments, and the code contains several inline comments. For details on the project/research itself, refer to the appropriate document.

References (for this script)

Mohr, S. (2021) Regional Spatial Clusters of Earthquakes at the Pacific Ring of Fire: Analysing Data from the USGS ANSS ComCat and Building Regional Spatial Clusters. DSM020, Python, examined coursework cw1. University of London.

Mohr, S. (2023) Clustering of Earthquakes on a Worldwide Scale with the Help of Big Data Machine Learning Methods. DSM010, Big Data, examined coursework cw2. University of London.

Mohr, S. (2024a) Comparing Different Tectonic Setups Considering Publicly Available Basic Earthquake’s Data. DSM050, Data Visualisation, examined coursework cw1. University of London.

History

250111 Generation of script, loading and formatting timeseries, timeseries statistics, plotting timeseries,
       technical basis (ACF, PACF, stationarity, differencing)
250112 Plotting ACF and PACF (function), enhancing analysis for stationarity and differencing,
       function for stationarity analysis, moving all functions to shared_procedures.py,
       first tests with crosscorrelation, refining CCF(ts1,ts2) and CCF(ts2,ts1), prepare workflow basics
250113 TOPn CCF for documentation and comparison via function top_max_min_indices_as_dataframe,
       adding information to figures, adding CI bands, combine everything in function calculate_ccf_results,
       make functions for CCF, move all functions to shared_procedures
250114 Rename 'size' to 'count', completely remake show_crosscorrelation_results by adding confidence intervals,
       re-organize workflow, group important analysis (so far) and clean the code and workbook, no need to make
       a dictionary or something similar due to unnecessary work (!)
250115 Added ACF, PACF monthly plots, use scatterplot and stem as an alternative, add more custom parameters,
       seasonal component analysis for ActiveEruptions, adding first tests with FFT
250116 Adding information about timeseries continuity and ruptures, Fourier Analysis, FFT for ActiveEruptions,
       xlimits for FFT figures, finalise FFT, improve stemplots for CCF
250117 Box-Cox transformation, Ljung-Box test, heteroskedasticity (HET) testing
250118 Make same workflow everywhere, add all data so far, add STFT analysis, show CCF coeff. for the FFT
       high-power frequencies at their respective lags to compare them with the FFT analysis and decide on
       their importance (FFT combined with CCF), check complete notebook for successful processing and library
       imports and make some cleanups using only really needed parts
250121 Switching to parameters.py, adding selectable cluster directory, changing all directories, 
       saving complete notebook as experimentation protocol to cluster directory, add scope and cluster to 
       figure titles, saving CCF figures to data_dir_cluster, save Timeseries Rupture Plot, check and drop 
       non-matching values for Box Cox transformations
250122 Clean everything, always show the same timeseries, add titles and labels where possible and adequate,
       calculate several CCF (see protocol), check everything, duplicate and adapt it for the GVP analysis, GOOD LUCK!
250123 Prepare everything for creating the figures and protocols (establish test protocol), run gvp_1000 analysis
250124 Update workflow for CCF, add all selected timeseries everywhere, clean up everything again, make one single
       50_* workflow script for all scopes, analysing scope gvp_1000 and studyarea_1000
       ==> scope studyarea_1000 and scope gvp_1000 are clean and results are saved!

Todo

./.

Preparing the environment

System information

Setting PATH correctly

Loading libraries

Setting the script environment

Loading timeseries

EDA

Statistics and basic information

Plot timeseries

Daily

Weekly

Monthly

Yearly

Timeseries continuity and ruptures

Yearly

ACF and PACF analysis
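
As a rough illustration of what this section examines, the sketch below computes the sample autocorrelation function (ACF) with NumPy only and shows how first-order differencing removes the slow ACF decay caused by a trend. The function name `sample_acf` and the synthetic series are hypothetical and used for illustration only; the notebook's own plotting functions live in shared_procedures.py and may work differently. The PACF is not covered by this sketch.

```python
import numpy as np


def sample_acf(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()                      # centre the series
    denom = np.sum(x * x)                 # lag-0 autocovariance times n
    return np.array(
        [np.sum(x[: x.size - k] * x[k:]) / denom for k in range(max_lag + 1)]
    )


# Synthetic series with a linear trend (illustration only, not project data)
rng = np.random.default_rng(42)
trended = np.arange(100.0) + rng.normal(scale=1.0, size=100)

acf_raw = sample_acf(trended, max_lag=10)            # decays slowly: non-stationary
acf_diff = sample_acf(np.diff(trended), max_lag=10)  # trend removed by differencing
```

A slowly decaying ACF, as in `acf_raw`, is a common sign of non-stationarity, which is one reason a series may be differenced before the crosscorrelation analysis.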

Fourier Transformation (Frequency & Period Domain) and STFT
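
The idea behind the frequency and period view can be sketched as follows: the one-sided power spectrum of an evenly spaced, mean-removed series is computed with `numpy.fft.rfft`, and the strongest frequencies are converted to periods. The helper name `dominant_periods` and the synthetic monthly series are assumptions made for this example and are not taken from the notebook's code; the STFT is not covered here.

```python
import numpy as np


def dominant_periods(values, sample_spacing=1.0, top_n=3):
    """Return the top_n (period, power) pairs of an evenly spaced real series."""
    x = np.asarray(values, dtype=float)
    x = x - x.mean()                                   # remove the constant offset
    power = np.abs(np.fft.rfft(x)) ** 2                # one-sided power spectrum
    freqs = np.fft.rfftfreq(x.size, d=sample_spacing)  # cycles per time step
    # Rank all bins except the zero frequency by descending power
    order = np.argsort(power[1:])[::-1][:top_n] + 1
    return [(1.0 / freqs[i], float(power[i])) for i in order]


# Synthetic monthly series with a 12-month cycle (illustration only)
t = np.arange(240)
demo = 5.0 + 2.0 * np.sin(2 * np.pi * t / 12)
top = dominant_periods(demo, top_n=1)
strongest_period = top[0][0]   # about 12 time steps for this synthetic input
```

Periods are expressed in units of the sampling interval, so a monthly series yields periods in months and a yearly series yields periods in years.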

Seasonal decomposition (one feature only)

From here on, the analysis is carried out manually ...

Crosscorrelation analysis CCF

Includes imputation and the handling of heteroskedasticity and stationarity.
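
A minimal sketch of a lagged crosscorrelation with an approximate 95 % confidence band is given below. It assumes two equally long series that are already imputed and stationary, and it uses the common white-noise approximation of ±1.96/√n for the band. The function name `ccf_with_ci` and the synthetic data are hypothetical; the notebook's own functions (for example calculate_ccf_results) may compute lags, signs, and bands differently.

```python
import numpy as np


def ccf_with_ci(x, y, max_lag):
    """Correlate x[t] with y[t + lag] for lag = -max_lag..max_lag.

    Returns the lags, the coefficients, and the half-width of an
    approximate 95 % band under a white-noise assumption.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    xs = (x - x.mean()) / x.std()         # standardise both series
    ys = (y - y.mean()) / y.std()
    lags = np.arange(-max_lag, max_lag + 1)
    coeffs = []
    for k in lags:
        if k >= 0:
            c = np.sum(xs[: n - k] * ys[k:]) / n    # y shifted forward by k
        else:
            c = np.sum(xs[-k:] * ys[: n + k]) / n   # y shifted backward by |k|
        coeffs.append(c)
    return lags, np.array(coeffs), 1.96 / np.sqrt(n)


# Synthetic example: y repeats x three steps later (illustration only)
rng = np.random.default_rng(0)
x_demo = rng.normal(size=200)
y_demo = np.roll(x_demo, 3)

lags, coeffs, ci = ccf_with_ci(x_demo, y_demo, max_lag=10)
best_lag = int(lags[np.argmax(np.abs(coeffs))])   # lag with the strongest coefficient
```

With this sign convention, a positive lag means that the second series follows the first. Calling the function with the arguments swapped mirrors the lags, which is why CCF(ts1,ts2) and CCF(ts2,ts1) need to be read with care.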

(1) Yearly 'Earthquakes Count' vs. 'Starting Eruptions'

(2) Yearly 'Released Earthquake Energy' vs. 'Erupted Volume'

(3) Yearly 'Released Earthquake Energy' vs. 'Starting Eruptions'

(4) Yearly 'Max Earthquake Magnitude' vs. 'Max Max VEI'

End of script

Appendix

Save notebook as experimentation protocol